Accelerating BLAS on Custom Architecture through Algorithm-Architecture Co-design

Authors

  • Farhad Merchant
  • Tarun Vatwani
  • Anupam Chattopadhyay
  • Soumyendu Raha
  • S. K. Nandy
  • Ranjani Narayan
Abstract

Basic Linear Algebra Subprograms (BLAS) play a key role in high-performance and scientific computing applications. Experimentally, yesteryear multicore processors and General Purpose Graphics Processing Units (GPGPUs) achieve 15% to 57% of the peak performance of the underlying platform, at 65 W to 240 W of power respectively, for compute-bound operations like Double/Single Precision General Matrix Multiplication (XGEMM), while for bandwidth-bound operations like Single/Double Precision Matrix-vector Multiplication (XGEMV) they achieve merely 5% to 7% respectively. Achieving performance for BLAS requires moving away from conventional wisdom and evolving towards a customized accelerator tailored for BLAS. In this paper, we present acceleration of Level-1 (vector operations), Level-2 (matrix-vector operations), and Level-3 (matrix-matrix operations) BLAS through algorithm-architecture co-design on a Coarse-grained Reconfigurable Architecture (CGRA). We choose the REDEFINE CGRA as a platform for our experiments since REDEFINE can be adapted to support a domain of interest through tailor-made Custom Function Units (CFUs). For efficient sequential realization of BLAS, we present a design of a Processing Element (PE) that can achieve up to 74% of the peak performance of the PE for DGEMM, 40% for DGEMV, and 20% for DDOT. We attach this PE to the REDEFINE CGRA as a CFU and show the scalability of our solution. Finally, we show a performance improvement of 3-140x in the PE over commercially available Intel micro-architectures, ClearSpeed CSX700, FPGA, and Nvidia GPGPUs.
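To make the three BLAS levels named in the abstract concrete, the following is a minimal pure-Python sketch of DDOT (Level-1), DGEMV (Level-2), and DGEMM (Level-3). It is not the paper's implementation: the function signatures are simplified assumptions (no strides, transposes, or leading dimensions as in the reference BLAS interface), and `flops_per_element` is a hypothetical helper that gives only a rough ratio of floating-point operations to distinct operand elements for square operands, ignoring caches and write-backs. That ratio is one common way to see why DGEMM tends to be compute bound while DGEMV and DDOT tend to be bandwidth bound.

```python
def ddot(x, y):
    """Level-1 BLAS sketch: dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def dgemv(alpha, A, x, beta, y):
    """Level-2 BLAS sketch: returns alpha*A*x + beta*y (A is a list of rows)."""
    return [alpha * ddot(row, x) + beta * yi for row, yi in zip(A, y)]


def dgemm(alpha, A, B, beta, C):
    """Level-3 BLAS sketch: returns alpha*A*B + beta*C (matrices as lists of rows)."""
    cols_of_B = list(zip(*B))  # transpose B once so each column is a tuple
    return [
        [alpha * ddot(row, col) + beta * cij for col, cij in zip(cols_of_B, c_row)]
        for row, c_row in zip(A, C)
    ]


def flops_per_element(n):
    """Hypothetical helper: flops divided by distinct operand elements, size n.

    Assumes square n x n matrices and length-n vectors; a rough estimate only.
    """
    return {
        "ddot": (2 * n) / (2 * n),               # 2n flops over two vectors
        "dgemv": (2 * n * n) / (n * n + 2 * n),  # 2n^2 flops over A, x, y
        "dgemm": (2 * n ** 3) / (3 * n * n),     # 2n^3 flops over A, B, C
    }


if __name__ == "__main__":
    A = [[1.0, 2.0], [3.0, 4.0]]
    B = [[5.0, 6.0], [7.0, 8.0]]
    print(ddot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
    print(dgemv(1.0, A, [1.0, 1.0], 0.0, [0.0, 0.0]))
    print(dgemm(1.0, A, B, 0.0, [[0.0, 0.0], [0.0, 0.0]]))
    print(flops_per_element(100))
```

In this estimate the ratio stays near 1 for DDOT and approaches 2 for DGEMV regardless of size, while for DGEMM it grows linearly with n, which is consistent with the abstract's distinction between compute-bound and bandwidth-bound operations.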

Similar resources

Accelerating BLAS and LAPACK via Efficient Floating Point Architecture Design

Basic Linear Algebra Subprograms (BLAS) and Linear Algebra Package (LAPACK) form basic building blocks for several High Performance Computing (HPC) applications and hence dictate performance of the HPC applications. Performance in such tuned packages is attained through tuning of several algorithmic and architectural parameters such as number of parallel operations in the Directed Acyclic Graph...

Porosity Rendering in High-Performance Architecture: Wind-Driven Natural Ventilation and Porosity Distribution Patterns

Natural ventilation is one of the most essential issues in the concept of high-performance architecture. Porosity has a lot to do with wind-phil architecture in achieving high efficiency in integrated architectural design and in materializing a high-performance building. Natural ventilation performance in porous buildings is influenced by a wide range of interre...

Recognizing the Role of Idea and Concept in Understanding and Creation in Architecture Relying on the "Four Causes"

Today, the increasing realities that have occupied architects in other fields related to architecture, have caused the designer's attention to deviate from the theoretical thinking that was considered at the beginning of the design process. Architectural software has expanded the visual dimensions of the human mind and created the conditions for the designer's thinking to be lim...

An Integrated Software Environment to Design Polymorphic Fault Tolerant Processors on Radiation Hardened FPGAs

1. A comprehensive literature review has been conducted to understand the current status of on-board autonomous mission planning based iterative repair algorithms, application-specific processor techniques, and use of FPGAs as on-board computers in the space environment. 2. An application-specific hardware architecture (pipelined processor) has been designed and developed for accelerating Itera...

Capability Analyzing of Solar Energy Based on Climatic Criteria Recognition in Iran’s Architectural Design by the Use of Fuzzy Analytical Hierarchy Process Method (FAHP)

Developing a comprehensive document based on the utmost use of renewable energy efficiency in the architecture design is the first step in national level to follow the goals of sustainable architecture and this is not possible without having a deep trend of the climatic compartment. The modeling of comprehensive energy plans in the architecture without having a quantitative approach is incomple...

Journal title:
  • CoRR

Volume abs/1610.06385  Issue 

Pages  -

Publication date 2016